Solving Multi-Objective MDP with Lexicographic Preference: An application to stochastic planning with multiple quantile objective
Abstract
In the most common settings of a Markov Decision Process (MDP), an agent evaluates a policy by the expectation of the (discounted) sum of rewards. However, in many applications this criterion may be unsuitable for two reasons. First, in risk-averse settings the expectation of accumulated rewards is not robust enough, as is the case when the distribution of accumulated reward is heavily skewed. Second, many applications naturally take several objectives into consideration when evaluating a policy; for instance, in autonomous driving an agent needs to balance speed and safety when choosing an appropriate decision. In this paper, we consider evaluating a policy based on the sequence of quantiles it induces on a set of target states. Our idea is to reformulate the original problem as a multi-objective MDP with a naturally defined lexicographic preference. To compute an optimal policy, we propose an algorithm, FLMDP, that can solve general multi-objective MDPs with lexicographic reward preference.
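To make the notion of a lexicographic preference concrete, the following is a minimal sketch of finite-horizon backward induction with vector-valued rewards, in which the first objective is maximized first and later objectives only break ties. It is not the FLMDP algorithm from the paper, whose details are not given in the abstract; the function name, the toy driving model, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model: transitions[s][a] is a list of
# (probability, next_state, reward_vector) outcomes.
Outcome = Tuple[float, str, Tuple[float, ...]]
Transitions = Dict[str, Dict[str, List[Outcome]]]


def lexicographic_backward_induction(
    transitions: Transitions, horizon: int, num_objectives: int, tol: float = 1e-9
):
    """Return (values, policy) for a finite-horizon lexicographic MDP.

    values[s] is the reward vector obtained from s over `horizon` steps;
    policy[t][s] is the action chosen in s at decision step t.
    """
    zero = (0.0,) * num_objectives
    values = {s: zero for s in transitions}
    policy: List[Dict[str, str]] = []
    for _ in range(horizon):
        new_values, step_policy = {}, {}
        for s, actions in transitions.items():
            if not actions:  # terminal state: nothing left to collect
                new_values[s] = zero
                continue
            # Vector of expected returns for each action.
            q = {
                a: tuple(
                    sum(p * (r[i] + values[s2][i]) for p, s2, r in outcomes)
                    for i in range(num_objectives)
                )
                for a, outcomes in actions.items()
            }
            # Filter actions objective by objective, most important first.
            candidates = list(q)
            for i in range(num_objectives):
                best = max(q[a][i] for a in candidates)
                candidates = [a for a in candidates if q[a][i] >= best - tol]
            step_policy[s] = candidates[0]
            new_values[s] = q[candidates[0]]
        values = new_values
        policy.insert(0, step_policy)  # built backwards, so prepend
    return values, policy


# Objective 0 is safety (a penalty for crashing), objective 1 is speed.
toy: Transitions = {
    "start": {
        "reckless": [(0.9, "goal", (0.0, 3.0)), (0.1, "crash", (-1.0, 0.0))],
        "brisk": [(1.0, "goal", (0.0, 2.0))],
        "slow": [(1.0, "goal", (0.0, 1.0))],
    },
    "goal": {},
    "crash": {},
}

values, policy = lexicographic_backward_induction(toy, horizon=1, num_objectives=2)
print(policy[0]["start"], values["start"])
```

In this toy model the "reckless" action has the highest expected speed but is discarded because it is worse on safety; "brisk" and "slow" tie on safety, so speed decides between them. A weighted sum of the two objectives would not guarantee this ordering for every choice of weights.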
Similar resources
Constrained consumable resource allocation in alternative stochastic networks via multi-objective decision making
Many real projects complete through the realization of one and only one path of various possible network paths. Here, these networks are called alternative stochastic networks (ASNs). It is supposed that the nodes of considered network are probabilistic with exclusive-or receiver and exclusive-or emitter. First, an analytical approach is proposed to simplify the structure of t...
A multiple objective approach for joint ordering and pricing planning problem with stochastic lead times
The integration of marketing and demand with logistics and inventories (supply side of companies) may cause multiple improvements; it can revolutionize the management of the revenue of rental companies, hotels, and airlines. In this paper, we develop a multi-objective pricing-inventory model for a retailer. Maximizing the retailer's profit and the service level are the objectives, and shorta...
Solving matrix games with hesitant fuzzy pay-offs
The objective of this paper is to develop matrix games with pay-offs of triangular hesitant fuzzy elements (THFEs). To solve such games, a new methodology has been derived based on the notion of weighted average operator and score function of THFEs. Firstly, we formulate two non-linear programming problems with THFEs. Then applying the score function of THFEs, we transform these two problems in...
A New Method For Solving Linear Bilevel Multi-Objective Multi-Follower Programming Problem
Linear bilevel programming is a decision making problem with a two-level decentralized organization. The leader is in the upper level and the follower, in the lower level. This study addresses linear bilevel multi-objective multi-follower programming (LB-MOMFP) problem, a special case of linear bilevel programming problems with one leader and multiple followers where each decision maker has sev...
Solving Critical Path Problem in Project Network by a New Enhanced Multi-objective Optimization of Simple Ratio Analysis Approach with Interval Type-2 Fuzzy Sets
Decision making is an important issue in business and project management that assists finding the optimal alternative from a number of feasible alternatives. Decision making requires adequate consideration of uncertainty in projects. In this paper, in order to address uncertainty of project environments, interval type-2 fuzzy sets (IT2FSs) are used. In other words, the rating of each alternativ...
Journal: CoRR
Volume: abs/1705.03597
Publication date: 2017